Case study on algebraic software methodologies for scientific computing
Author: M. Haveraaen
Finally we discuss the implementation of the PDE problem domain concepts.

3.1. Program construction and reasoning

In order to build programs we need algorithm and data constructors, cf. the slogan Algorithms + Data Structures = Programs [59]. Most of the early programming languages strictly followed the von Neumann machine model, i.e., they were sequential imperative programming languages. Imperative languages place the burden of organising instruction sequencing (choice, loops) and storage use (variable updates, arrays and pointers) on the programmer. It was soon argued that this was too restricting, and that the semantical notions, and hence the reasoning tools, for imperative languages were overly complex. Backus proclaimed his FP (functional programming) language as a reaction to this [4]. Functional languages provide the programmer with expressions and recursion as the primary tools for writing algorithms, and tuples and recursive types (lists and trees) for organising data. These are less error prone than their imperative counterparts. Unfortunately, in spite of this simplicity, recursively formulated algorithms often have an exponential growth in space and time complexity, while their imperative counterparts may be linear in time and constant in space. Memoisation (caching of computed function values) may help in avoiding unnecessary recomputation, but does not guarantee good usage of storage, and premature purging of the cache may be needed [16]. The lack of execution time efficiency, coupled with the lack of storage control, has hitherto proven detrimental to the use of functional programming for high performance computing, in spite of several attempts such as those in [22,54]. The constructive recursive approach with programmer defined explicit data dependencies may be a way out of this [14,27,31].

A general belief is that functional languages have a simpler semantics than imperative languages, and thus that they are more amenable to reasoning and manipulation. However, it was shown already in [43] that, with simple syntactic restrictions, imperative languages can be made just as semantically simple. This allows program reasoning and transformation to become just as easy in an imperative context as in a functional context.

3.2. Abstraction mechanisms

Program construction becomes more flexible if it is possible to abstract over algorithms and types. Algorithmic abstraction, i.e., naming algorithms by function symbols, is often called procedural abstraction, or SUBROUTINE in Fortran. Its use has been a great success for scientific computing in the form of large numerical libraries. Type abstraction, i.e., naming data structures as sorts, is likewise useful by itself. Algorithmic and type abstraction taken together give the notion of data abstraction: abstract data types or, in object-oriented terminology, classes. This couples the constructed models (data structures and algorithms) to the signature (sorts and function symbols) of the domain concepts. With encapsulation [47], data abstraction becomes a means of realising domain concepts as atomic elements in a general purpose programming language, thus tailoring it as a domain specific programming language. How this can be achieved and shown to be correct was demonstrated already in [33,42]. The use of data invariants, properties that must invariantly hold for the data, is very important for this. The data invariant reduces the state space of the data abstraction, making the operations easier to implement, and making the relationship with the mathematical models of a specification clearer.
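As a concrete illustration of a data abstraction with a data invariant, consider the following minimal C++ sketch (the class and its operations are illustrative, not taken from the paper): a rational number kept in lowest terms with a positive denominator. The invariant reduces the state space by giving each value a unique representation, which simplifies both the operations and the correspondence with the mathematical rationals.

```cpp
#include <numeric>    // std::gcd (C++17)
#include <stdexcept>

// Hypothetical data abstraction with the data invariant:
//   den_ > 0 and gcd(num_, den_) == 1.
class Rational {
    long num_, den_;

    void normalise() {   // re-establishes the invariant
        if (den_ == 0) throw std::invalid_argument("zero denominator");
        if (den_ < 0) { num_ = -num_; den_ = -den_; }
        long g = std::gcd(num_, den_);
        if (g > 1) { num_ /= g; den_ /= g; }
    }

public:
    Rational(long n, long d) : num_(n), den_(d) { normalise(); }

    // Ring operations; each constructor call re-establishes the invariant.
    Rational operator+(const Rational& o) const {
        return Rational(num_ * o.den_ + o.num_ * den_, den_ * o.den_);
    }
    Rational operator*(const Rational& o) const {
        return Rational(num_ * o.num_, den_ * o.den_);
    }

    // Since values are in normal form, equality is plain componentwise
    // comparison; without the invariant, 1/2 and 2/4 would compare unequal.
    bool operator==(const Rational& o) const {
        return num_ == o.num_ && den_ == o.den_;
    }
};
```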
In some cases hardware may provide an interesting feature, such as parallelism, which a general purpose programming language does not acknowledge. It is then normally possible for the user to define a programming language interface to the feature, providing it as an abstract data type. The hardware feature will normally not be generally available, but an implementation of the abstract data type in the programming language can emulate its overall functionality, providing access to the abstraction irrespective of the hardware platform used. Thus abstractions make it feasible to utilise advanced hardware much more quickly than waiting for improvements in compilers or programming languages.

Abstract data types let us abstract hardware features or software constructions, relating them to the sorts and function symbols resulting from our algebraic analysis of the problem domain. We thus have the ability to get implementations of the domain specific languages we find useful. With such abstractions, a program may be expressed at an arbitrarily high abstraction level. Often the use of (many layers of) abstractions reduces the run-time efficiency of code: partly because current compilers introduce extra instructions when calling procedures, due to an overly complex procedure call semantics, and partly because certain short-cuts that are possible on low-level code would cut through unrelated high-level abstractions. With syntactic restrictions the semantical notions of a programming language may be simplified, see Section 3.1, enabling an algebraic program transformation tool like CodeBoost, described in the accompanying paper [15], to optimise code by taking advantage of high-level properties of the source code as well as cutting through abstraction layers to introduce low-level short-cuts in the code and avoid procedure call overhead.

3.3. Data dependencies and parallelisation

In order to achieve higher performance for computational modelling, machines with multiple processors have been taken into use. Their utilisation apparently requires either a move away from the von Neumann programming approach, using functional languages or explicit parallel constructs in the code, or compilers that analyse and parallelise sequential imperative code. Functional programming has had some success for advanced parallel programming, see [54] for a collection of approaches and [38] for a recent overview. The use of explicit parallel constructs turns out to be very difficult to program, and is in general discouraged. Letting the compiler analyse the code for dependencies and then generate the parallelism sounds ideal, but turns out to be an NP-complete problem in general [39], although quite a lot may be achieved in practice. Compiler analysis may look at the flow of control (control dependence analysis), giving a coarse grain parallelisation. Or it may look at data dependencies, i.e., how data at one point during program execution depends on data at another point of the program execution, giving rise to fine grain parallelism. Good automatic analysis is only possible for programs with a simple semantical structure. This is typical of Fortran programs, where high performance is often essential, but this information is also clearly visible in functional programming.
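The following small C++ example (illustrative only, not from the paper) shows the kind of data dependencies such an analysis looks for: in the first loop every iteration is independent, so the iterations may run in parallel, while the second loop has a loop-carried dependency that forces sequential execution.

```cpp
// Each a[i] depends only on b[i] and c[i]: no dependency between
// iterations, so the loop is a candidate for fine grain parallelism.
void independent(double* a, const double* b, const double* c, int n) {
    for (int i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}

// Each a[i] depends on a[i-1] computed in the previous iteration:
// this loop-carried data dependency serialises the loop.
void carried(double* a, const double* b, int n) {
    for (int i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
}
```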
Many of the more advanced compiler optimisation strategies were discovered for functional programming languages, such as Crystal [13], and later adapted for imperative languages [60]. One step further is to make the data dependencies explicit abstractions in the programs. This approach has been shown in a functional programming context [14,26,31] and in an imperative context in [12,14]. In general there will be a gap between the data dependencies in a program and the communication patterns of the hardware. This gap has to be filled either by the compiler, a problem which again is NP-complete in general [39], or closed explicitly by the programmer, as in [14]. Making data dependencies available as an explicit structure has the added benefit that space and time considerations may be fully controlled by the programmer, both for sequential and parallel compilation, breaking the exponential execution time barrier often associated with functional programming. Another important observation is that when collection oriented abstractions, such as arrays, are used in sequential code, these collections may be distributed in parallel on processors [8]. This is the basis for data parallel programming [48], which has been coupled with abstraction oriented programming in the accompanying paper [28], giving direct access to this hardware feature as a data abstraction.

3.4. Implementing the coordinate free language

Showing that the domain oriented concepts can be implemented is a final assurance of their usefulness. It guarantees that the concepts actually are computable, and thus will be useful in the solution of the problems. To check that our coordinate free language is realisable we will sketch the implementation of two of the key abstractions, the scalar fields and the tensors.

Numerical discretisation methods (finite difference, finite element, etc.) make it possible to represent the scalar fields over an infinite set M, the manifold, by a finite approximation. Typically we store data values for carefully selected grid points in large array data structures. The discretisation has to provide the ring operations and the partial differentiation operations by performing computations on the stored data. The tensor is where coordinate systems are handled and the more advanced differentiation operations are implemented. Tensors are typically represented as multidimensional arrays, with appropriate, well-known algorithms for the tensor operations. These algorithms only require that the array elements are ring structures. These models can easily be built using the standard program constructors. This is well known from the use of the traditional, coordinate based language for the PDE domain, which only uses the basic types and constructors of programming languages. With data abstraction we may couple this together to create the coordinate free language.
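As a hypothetical sketch of the first of these abstractions (the names and the finite difference discretisation are illustrative, not the paper's code), a scalar field over a one-dimensional grid can store its values at the grid points and provide the ring operations and a partial derivative by computing on the stored data:

```cpp
#include <cstddef>
#include <vector>

// Scalar field over a 1-dimensional manifold, discretised as values
// stored at equally spaced grid points (at least two points assumed).
class ScalarField1D {
    std::vector<double> v_;   // value at each grid point
    double h_;                // grid spacing

public:
    ScalarField1D(std::vector<double> values, double spacing)
        : v_(std::move(values)), h_(spacing) {}

    // Ring operations: pointwise on the stored data values.
    ScalarField1D operator+(const ScalarField1D& o) const {
        ScalarField1D r(*this);
        for (std::size_t i = 0; i < v_.size(); ++i) r.v_[i] += o.v_[i];
        return r;
    }
    ScalarField1D operator*(const ScalarField1D& o) const {
        ScalarField1D r(*this);
        for (std::size_t i = 0; i < v_.size(); ++i) r.v_[i] *= o.v_[i];
        return r;
    }

    // Partial derivative by central differences, one-sided at the ends.
    ScalarField1D derivative() const {
        ScalarField1D r(*this);
        const std::size_t n = v_.size();
        for (std::size_t i = 1; i + 1 < n; ++i)
            r.v_[i] = (v_[i + 1] - v_[i - 1]) / (2 * h_);
        r.v_[0] = (v_[1] - v_[0]) / h_;
        r.v_[n - 1] = (v_[n - 1] - v_[n - 2]) / h_;
        return r;
    }
};
```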
4. Programming in the large: software architectures

In our short overview we have sketched an analysis of the PDE domain that provided us with PDE domain specific languages, the traditional coordinate based language and the coordinate free language, with algebraic specifications of the concepts. The languages were shown adequate to formulate interesting problems from the domain (Section 2.3). We have also sketched that the concepts of these domain languages may be implemented using standard data structures and algorithms. Here we will study how to organise the concepts of the more abstract coordinate free language as a software library, i.e., study software architectures for such a library. In this analysis we also need to consider the issue of developing both sequential and parallel versions of the software.

A good software architecture is achieved if we minimise the number of distinct library components (packages), and the software complexity of each. (Software complexity is a measure of the complexity of the program text, as opposed to space and time complexity, which refer to run-time properties of the software. Software complexity correlates with the cost of developing and maintaining software.) How we combine the packages to achieve the problem domain specific concepts will be a blueprint for configuring (i.e., putting together) the application programs from the packages. Good choices here will greatly reduce the software development effort, both by directly reducing our coding effort and, more importantly, by identifying reusable components for other tasks within the same problem domain. In the algebraic specification language CASL, software architectures can be defined explicitly [7].

First we will introduce the notions of categories and functors, precise mathematical notions for the study of structure. Then we will use these tools to investigate an architecture for the coordinate free language. Lastly we present the Sophus library framework, which builds on this insight.

4.1. Algebraic structuring concepts

A collection of related mathematical structures, such as the data structures of a programming language, typically forms a category [20]. A category C is a collection of objects A, B, ..., and morphisms f : X → Y from objects X to objects Y, with an associated associative composition rule ◦ on morphisms and a neutral morphism (with respect to ◦) for each object. We will use the categories Prog and Set as our examples. The category Set is from mathematics. It has sets as objects and total functions between sets as morphisms. In Prog the objects are data structures, and the morphisms are all side-effect free algorithms from a data structure to a data structure. The identity morphism and composition rule for morphisms in these categories should be obvious. Functions of more than one argument are defined from special objects called product objects in the category. A comprehensive introduction to category theory may be found in [32], while [56] is a lighter, more intuitive introduction.

Categories are related by functors, functions between categories. A functor F : C → D, from category C to category D, maps objects to objects and morphisms to morphisms such that identities and compositions are preserved. It is not too hard to find a functor from Prog to Set that takes the data structures and algorithms (for some specific computer) to the sets of values and mathematical functions being computed.

Functors (on objects) are in many ways like C++ template classes [53] or Ada generic packages [5]. These mechanisms take a data type as argument and define a new data type based on it. We may for instance define a generic list package with a type parameter, such that whenever we instantiate the package with a data structure D, we get a data structure list of D.
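In C++ such a type-parameterised package can be written as a template class. A minimal, hypothetical sketch:

```cpp
// Instantiating List with a data structure D yields the data structure
// "list of D"; this is the object part of the list constructor.
template <typename D>
class List {
    struct Node { D value; Node* next; };
    Node* head_ = nullptr;

public:
    void prepend(const D& d) { head_ = new Node{d, head_}; }
    bool empty() const { return head_ == nullptr; }
    // Destructor, copying and iteration omitted for brevity.
};

// Usage: List<int> is a list of int; constructions nest, so
// List<List<double>> is a list of lists of double.
```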
The functor version of a data type constructor has some additional properties. A list constructing functor L : Prog → Prog takes a data structure D and returns a list of D data structure L(D). But in addition to defining the list data type, it will take any function f : D → E and define an iterator function L(f) : L(D) → L(E). When L(f) is given a list of D as argument, it will perform f on every element of that list, returning a list of E of the results.

Likewise we may treat array data structure constructors as functors. For every index type I we have an array constructing functor A_I : Prog → Prog which takes an object E and defines array I of E, the array structure with elements of type E. But we also get the iterator functions. Given for instance a binary operation + : E × E → E, we have A_I(+) : A_I(E) × A_I(E) → A_I(E), which adds the elements of the two argument arrays componentwise, i.e., for each index i ∈ I, yielding a new array with the summed values. The functor mechanism preserves equational properties of the argument E for A_I(E). If E is a ring, such as the reals R, then A_I(E) will also be a ring. This is very convenient, but may only be simulated by explicit programming of these functions in current programming languages. (Unfortunately, this is not fully sufficient, as the generic package mechanisms do not allow enough genericity to let us do this once and for all. We will omit a discussion of the technicalities of these deficiencies.)

Code inheritance, in spite of its popularity in the object-oriented community, is not an important feature in this structuring of software, and should only be considered incidental in the construction of a software system. While interface inheritance for specifications, and other methods of combining and relating specifications [50], behave very nicely when properties are added and modified, this is not the case for code inheritance. Code inheritance is a form of code reuse where the data structure is modified in a restricted way, some operations are replaced, and additional operations may be added. But this is only sound if the data invariants of the original and the new module are compatible. Also note that different models of the same abstraction may require unrelated implementations. Consider the ring abstraction: both real numbers and scalar fields are rings, but they are implemented by very different and unrelated data structures and algorithms.

4.2. Software architecture for PDE problems

When designing the software architecture for a domain like the PDE domain, we need to investigate how various constructors, such as the array functors, can be (re)used. A nice example is the construction of vector fields. We start with the observation that the array functor can be used to generate the value fields over a manifold M: simply apply A_M to the appropriate value domains, such as the reals R, vectors or matrices. We can also use the array construction to define finite vector spaces by the expression A_{1,...,n}(R), for appropriate natural numbers n. Actually the vector space implementation algorithms are independent of the ring itself, and will work for any ring R when given the data structure A_{1,...,n}(R). We have earlier noted that a scalar field has ring properties. As a consequence, n-dimensional vector fields over a manifold M may be constructed by either of two approaches (contrasted in the sketch after this list):

1. applying the value domain construction to vectors, A_M(A_{1,...,n}(R)), or
2. applying the vector construction to scalar fields, A_{1,...,n}(A_M(R)).
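A hypothetical C++ sketch of both points above (the names are illustrative): the array functor on objects is a template such as std::array, while its action on morphisms, e.g., A_I(+), must be programmed explicitly as a componentwise operation.

```cpp
#include <array>
#include <cstddef>

// The iterator function A_I(+): componentwise addition, one instance of
// the explicit programming needed to simulate the functor's morphism part.
template <typename E, std::size_t N>
std::array<E, N> operator+(const std::array<E, N>& x,
                           const std::array<E, N>& y) {
    std::array<E, N> r;
    for (std::size_t i = 0; i < N; ++i) r[i] = x[i] + y[i];
    return r;
}

// The two constructions of a 3-dimensional vector field over a manifold
// discretised as M grid points (M chosen arbitrarily for illustration):
constexpr std::size_t M = 1000;
using FieldOfVectors = std::array<std::array<double, 3>, M>; // A_M(A_{1..3}(R))
using VectorOfFields = std::array<std::array<double, M>, 3>; // A_{1..3}(A_M(R))
```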
There does not seem to be any reason to prefer one over the other, and conventional numerical software, as well as many object oriented numerics approaches, uses the first construction. If we study the problem domain further, we see that PDEs contain many distinct differentiation operators. Further, these operators may all be expressed from the partial derivatives on the scalar fields. A more fruitful approach is then to use the second construction as the starting point. Instead of building many different constructors for the value domains (vectors, matrices, multi-linear mappings, etc.), we also note that it suffices to build a tensor constructor, which, given certain assumptions, encompasses all of these. Tensors also give us the building blocks needed to define coordinate free operators.

To implement the full scalar field abstraction we expand the construction A_M such that it also includes the definition of partial differential operators, to get a functor S_M : Prog → Prog for the construction of scalar fields. The tensor constructor T_{1,...,n} : Prog → Prog extends A_{1,...,n} with the derivation operations and other tensor operators. The tensor field construction, including all derivation operations, for a manifold M then becomes T_{1,...,n}(S_M(R)).

As noted, we should expect to reuse the array functors in the construction of both scalar fields and tensors. Using the array constructor to implement both the numerical discretisation and the tensor construction allows a reuse of the array module. But more importantly it allows a separation of concerns when implementing these modules: the array constructor may focus on the data layout pattern, while the numerical modules may focus on the numerical aspects, using the array construction for the storage aspects. The software architecture also implies that we only need to relate to, and thus implement, the discretisation method when we implement the scalar field, and that the vector and tensor field implementations are independent of this choice. If we need to change discretisation method, the change will be localised to one module, and not spread out all over the code, as is the normal case with traditional numerical software. This also provides a route to parallelisation. We will, at the scalar field level at least, have a large collection of data values that may be distributed in a data parallel fashion [10]. Actually, it suffices to provide a parallel implementation of the array constructor to get a parallel version of the whole program. See the accompanying paper [28] for a discussion of this.
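Expressed as nested C++ templates, this architecture amounts to composing type constructors. The following outline is a hypothetical sketch (all names are illustrative), showing how T_{1,...,n}(S_M(R)) would be formed:

```cpp
#include <cstddef>

// S_M: scalar fields, storing R-values at the grid points of a
// discretised manifold M; provides ring operations and partial derivatives.
template <typename R>
class ScalarField { /* discretisation-specific implementation */ };

// T_{1..N}: an N-dimensional multiarray of Ring elements; provides the
// componentwise ring operations plus the tensor and derivation operations,
// using only the ring operations (and partial derivatives) of Ring.
template <std::size_t N, typename Ring>
class Tensor { /* coordinate-system-specific implementation */ };

// The tensor field construction T_{1..3}(S_M(R)):
using TensorField = Tensor<3, ScalarField<double>>;

// The same Tensor code also serves for ordinary tensors over the reals,
// Tensor<3, double>, since both double and ScalarField<double> are rings.
```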
4.3. The Sophus library framework

The software architecture discussed in the previous section is the basis for the Sophus software library framework. It provides the abstract mathematical concepts from PDE theory as programming entities. Its concepts are based on the notions of manifold, scalar field and tensor field, while implementations are based on the conventional numerical algorithms and discretisations. Sophus is structured around the following concepts:

– Basic n-dimensional mesh functor M_n : Prog → Prog, for any natural number n, taking a ring R as argument. A mesh structure M_n(R) is like an array A_{{1,...,k_1}×···×{1,...,k_n}}(R) with element type R, and includes the general iterator operations. Specifically, operations like +, − and ∗ are iterated over all elements (like collection oriented array operators), as are operations to add, subtract and multiply all elements of the mesh by a scalar. There are also operations for shifting meshes in one or more dimensions. Operations like multidimensional matrix multiplication and equation solvers may easily be implemented for the meshes. Sparse meshes, i.e., meshes where most of the elements are 0 or have some other fixed value, may also be provided. Parallel and sequential implementations of mesh structures can be used interchangeably, allowing easy porting of any program built on top of the mesh abstraction between computer architectures. (A sketch of such a mesh interface follows this list.)

– Manifolds M. These define sets with a notion of proximity and direction. A manifold is the index set for a value field. It represents the physical space where the problem to be solved takes place.

– Scalar fields S_M. These describe the measurable quantities of the physical problem to be solved. As the basic layer of "continuous mathematics" in the library, they provide the partial derivation and integration operations. Also, two scalar fields on the same manifold may be pointwise added, subtracted and multiplied. The different discretisation methods, such as finite difference, finite element and finite volume methods, provide different designs for the implementation of scalar fields. Scalar fields are typically implemented using the basic mesh structures for the data.

– Tensors T_{1,...,n}. These provide coordinate free mathematics based on knowledge of the coordinate system, whether it is cartesian, axisymmetric or general curvilinear. The tensor module provides the general differentiation and integration operations, based on the partial derivatives and integrals of the scalar fields. Tensors also provide operations such as componentwise addition, subtraction and multiplication, as well as tensor product, composition and application. The implementation uses the basic mesh structures, with scalar fields as the ring parameter.

– Equation administrators. These are abstractions containing collections of scalar and tensor fields, with the purpose of building the matrices and vectors used to describe sets of linear equations, such as those needed for implicit time stepping schemes. These matrices and vectors do not represent coordinate free properties of a physical system, but abstract the important properties of linear equations. Equation administrators are also implemented using mesh structures, with tensor fields or reals as the ring, as appropriate.
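The mesh interface might look as follows in C++; this is a hypothetical sketch of the concept described above, not Sophus source code.

```cpp
#include <cstddef>
#include <vector>

// An n-dimensional mesh of elements from a ring R, with the iterated
// (elementwise) ring operations. A parallel implementation offering the
// same interface could distribute data_ across processors, and the two
// would be interchangeable for any program built on top of the mesh.
template <typename R>
class Mesh {
    std::vector<std::size_t> shape_;   // extents k_1, ..., k_n
    std::vector<R> data_;              // elements, row-major order

public:
    explicit Mesh(std::vector<std::size_t> shape) : shape_(std::move(shape)) {
        std::size_t total = 1;
        for (std::size_t k : shape_) total *= k;
        data_.resize(total);           // R assumed default-constructible
    }

    // Iterated ring operation: + applied to all element pairs.
    Mesh& operator+=(const Mesh& o) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] += o.data_[i];
        return *this;
    }

    // Multiply all elements of the mesh by a single scalar.
    Mesh& operator*=(const R& s) {
        for (R& x : data_) x *= s;
        return *this;
    }

    // Shift the mesh by one position along the given dimension
    // (definition omitted in this sketch).
    void shift(std::size_t dimension);
};
```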
Further abstraction levels, such as time integrators [2], may be added to this framework to raise the abstraction level further as even more aspects of the problem domain are investigated. Using Sophus, the solvers are formulated on top of the coordinate-free layer, forming an abstract, high level program for the solution of the problem.

The Sophus library framework can be implemented using any object-oriented programming language, or any programming language with abstract data types. Ideally the language should have template classes. This includes languages like Ada [5], Clu [36,37], C++ [53], Generic Java [11] and Standard ML [41]. Languages which have the abstraction mechanism but lack the template mechanism, such as Fortran-90 [1] and Java [21], will be much harder to use. We have chosen C++ for our implementations, since it is widespread, has reasonably good compilers, and has gained some acceptance in the numerical community.

5. Specification, certification and proofs of modules

We have developed our presentation as if we could satisfy our specifications, such as that of the ring, when realising them on a computer, e.g., in the form of floating point numbers or scalar field approximations. This is not the case, as is well known in numerical analysis. The problem is not just that we cannot represent the abstractions exactly on our finite computers, but that the available representations break the fundamental laws of our abstractions. For instance, the machine's floating point numbers do not satisfy the associative laws, Eqs (1) and (3), of rings, laws which are fundamental for the development of linear algebra; a small demonstration is sketched at the end of this section. This can be remedied if we choose to use computable reals as our abstraction [57]. But this would only touch the tip of the iceberg, as the discretisations we work with only provide coarse approximations to the mathematical concepts, and this seems to be fundamental for numerics [52].

Accepting this situation, there is a need to supply the approximate implementations with some kind of information about their inaccuracy. This is routinely done at the level of the implementation, such as in [46]. With abstractions, a certificate of the approximation's characteristics at the level of the specification is more appropriate. The certificate should also include information about storage space requirements and execution time properties. How this can be done remains open, but [55] represents a start. Our proof methods also fall short when moving into this terrain: how do we prove that our implementation satisfies our specification sufficiently well, when it obviously has to break the most fundamental properties? What notions of satisfaction we will need is open and needs research.
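The promised demonstration that machine floating point breaks the associative law of addition (a standard observation; the particular numbers are chosen for illustration):

```cpp
#include <cstdio>

// With IEEE double precision, adding 1.0 to -1.0e16 is lost to rounding,
// so the two association orders give different results.
int main() {
    double a = 1.0e16, b = -1.0e16, c = 1.0;
    std::printf("(a + b) + c = %g\n", (a + b) + c);   // prints 1
    std::printf("a + (b + c) = %g\n", a + (b + c));   // prints 0
    return 0;
}
```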
Journal: Scientific Programming, volume 8, 2000.